A Faster Algorithm for Approximate String Matching

Authors

Ricardo A. Baeza-Yates

Gonzalo Navarro

Abstract

We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the text as input. This simulation uses bit operations on a RAM machine with word length $O(\log n)$, where $n$ is the maximum size of the text. The running time achieved is $O(n)$ for small patterns (i.e., $m = O(\sqrt{\log n})$), independently of the maximum number of errors allowed, $k$. This algorithm is then used to design two general algorithms. One of them partitions the problem into subproblems, while the other partitions the automaton into subautomata. These algorithms are combined to obtain a hybrid algorithm which, on average, is $O(n)$ for moderate $k/m$ ratios, $O(\sqrt{mk/\log n}\, n)$ for medium ratios, and $O((m-k)kn/\log n)$ for large ratios. We show experimentally that this hybrid algorithm is faster than previous ones for moderate-size patterns, which is the case in text searching.
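To make the idea of simulating the pattern automaton with bit operations concrete, here is a minimal sketch. It is not the authors' algorithm: the abstract does not give the construction, so this sketch swaps in the simpler row-wise simulation in the style of Wu and Manber's Shift-And with errors, keeping one bit mask per error level. It also assumes Python's arbitrary-precision integers instead of fixed-width machine words, and the function name `approx_search` is hypothetical.

```python
def approx_search(pattern, text, k):
    """Return the end positions (0-based) in `text` where `pattern`
    matches a substring with at most `k` edit errors
    (insertions, deletions, substitutions)."""
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")

    # masks[c] has bit i set iff pattern[i] == c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << m) - 1      # keeps every row within m bits
    accept = 1 << (m - 1)    # bit of the final automaton state

    # rows[d]: bit i set iff pattern[:i+1] matches a suffix of the text
    # read so far with at most d errors. Before reading any text, a
    # prefix of length <= d is reachable by deleting its characters.
    rows = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = rows[0]                     # row d-1 before this step
        rows[0] = ((rows[0] << 1) | 1) & b     # exact-match transitions
        for d in range(1, k + 1):
            old = rows[d]
            match = ((old << 1) | 1) & b
            insertion = prev_old                          # extra text character
            subst_or_del = ((prev_old | rows[d - 1]) << 1) | 1
            rows[d] = (match | insertion | subst_or_del) & full
            prev_old = old
        if rows[k] & accept:
            ends.append(j)
    return ends
```

For instance, `approx_search("survey", "surgery", 2)` reports the text positions where an occurrence with at most two errors ends. This sketch performs about k + 1 word operations per text character, so unlike the algorithm in the abstract its cost grows with k.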

Similar resources

Faster Filters for Approximate String Matching

We introduce a new filtering method for approximate string matching called the suffix filter. It has some similarity with well-known filtration algorithms, which we call factor filters, and which are among the best practical algorithms for approximate string matching using a text index. Suffix filters are stronger, i.e., produce fewer false matches than factor filters. We demonstrate experiment...

Approximate String Matching with Reduced Alphabet

We present a method to speed up approximate string matching by mapping the factual alphabet to a smaller alphabet. We apply the alphabet reduction scheme to a tuned version of the approximate Boyer–Moore algorithm utilizing the Four-Russians technique. Our experiments show that the alphabet reduction makes the algorithm faster. Especially in the k-mismatch case, the new variation is faster tha...

A Fast Algorithm for Approximate String Matching on Gene Sequences

Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch p...

LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties

Motivation: Approximate string matching is a pivotal problem in the field of computer science. It serves as an integral component for many string algorithms, most notably, DNA read mapping and alignment. The improved LV algorithm proposes an improved dynamic programming strategy over the banded Smith-Waterman algorithm but suffers from support of a limited selection of scoring schemes. In this p...

Approximate Boyer-Moore String Matching

The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...

Improved Single and Multiple Approximate String Matching

We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough ℓ-grams from text windows so as to prove that no occurrence can contain the part of the window read, and then shifting the window. Three variants of the algorithm are presented, which give different tradeoffs between how much they work in the window and how much they shift it. We show an...

Publication date: 1996
